Results 1 - 20 of 83
1.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38573185

ABSTRACT

BACKGROUND: Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. RESULTS: We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. CONCLUSIONS: Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.


Subjects
Mycobacterium tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Benchmarking , Databases, Factual , Genome, Human , Metagenome
2.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615

ABSTRACT

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.


Subjects
Metagenomics , Viverridae , Animals , Metagenomics/methods , Neural Networks, Computer , Metagenome , Machine Learning , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
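
The MetageNN abstract above describes representing error-prone long reads by short k-mer profiles. As a rough illustration of that idea only, the sketch below builds a normalized k-mer frequency vector for one read. The function name `kmer_profile`, the choice of k, and the handling of ambiguous bases are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_profile(read: str, k: int = 3) -> list[float]:
    """Return the relative frequency of every DNA k-mer in `read`.

    Hypothetical helper for illustration; not MetageNN's actual code.
    """
    read = read.upper()
    alphabet = "ACGT"
    # Fixed ordering of all 4**k possible k-mers so vectors are comparable.
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= set(alphabet)  # skip windows with N, etc.
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


# A 13-base read with one ambiguous base; k=2 gives a 16-dimensional vector.
profile = kmer_profile("ACGTACGTNACGT", k=2)
```

A vector like `profile` has a fixed length regardless of read length, which is one reason such profiles are convenient inputs for a neural network classifier.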
3.
Microbiol Resour Announc ; 13(4): e0106323, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38436268

ABSTRACT

The RDP Classifier is one of the most popular machine learning approaches for taxonomic classification due to its robustness and relatively high accuracy. Both the RDP taxonomy and RDP Classifier have been updated to incorporate newly described taxa and recent changes to prokaryotic nomenclature.

4.
Biochem Genet ; 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38285123

ABSTRACT

Asthma is a multifactorial disease with multiple phenotypes and diverse clinical and pathophysiological characteristics. Besides innate and adaptive immune responses, the gut microbiome generates Treg cells, mediating the allergic response to environmental factors and exposure to allergens. Because of the complexity of asthma, microbiome analysis and other precision medicine methods are now widely regarded as essential elements of efficient disease therapy. An in-silico pipeline enables the comparative taxonomic profiling of 16S rRNA metagenomic profiles of 20 asthmatic patients and 15 healthy controls utilizing QIIME2. Further, PICRUSt supports downstream gene enrichment and pathway analysis, inferring the enriched pathways in a diseased state. A significant abundance of the phylum Proteobacteria and of the genera Sutterella and Megamonas is identified in asthma patients, together with a diminished abundance of the genus Akkermansia. Nasal samples reveal a high relative abundance of Mycoplasma. Further, differential functional profiling identifies the metabolic pathways related to cofactors and amino acids, secondary metabolism, and signaling pathways. These findings support the view that a combination of bacterial communities mediates the responses involved in chronic respiratory conditions like asthma by exerting its influence on various metabolic pathways.

5.
mSystems ; 9(2): e0103923, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38275296

ABSTRACT

Specific bacterial species have been found to play important roles in the human vagina. Achieving high species-level resolution is vital for analyzing vaginal microbiota data. However, different methodological studies have yielded contradictory conclusions. A more comprehensive evaluation is needed to determine an optimal pipeline for vaginal microbiota. Based on the sequences of vaginal bacterial species downloaded from NCBI, we conducted simulated amplification with various primer sets targeting different 16S regions as well as taxonomic classification on the amplicons applying different combinations of algorithms (BLAST+, VSEARCH, and Sklearn) and reference databases (Greengenes2, SILVA, and RDP). Vaginal swabs were collected from participants with different vaginal microecology to construct 16S full-length sequenced mock communities. Both computational and experimental amplifications were performed on the mock samples. Classification accuracy of each pipeline was determined. Microbial profiles were compared between the full-length and partial 16S sequencing samples. The optimal pipeline was further validated in a multicenter cohort against the PCR results of common STI pathogens. Pipeline V1-V3_Sklearn_Combined had the highest accuracy for classifying the amplicons generated from both the NCBI downloaded data (84.20% ± 2.39%) and the full-length sequencing data (95.65% ± 3.04%). Vaginal samples amplified and sequenced targeting the V1-V3 region, but employing only the forward reads (223 bp) and classified using the optimal pipeline, resembled the mock communities the most. The pipeline demonstrated high F1-scores for detecting STI pathogens within the validation cohort.
We have determined an optimal pipeline to achieve high species-level resolution for vaginal microbiota with short amplicons, which will facilitate future studies.
IMPORTANCE: For vaginal microbiota studies, diverse 16S rRNA gene regions have been applied for amplification and sequencing, which affects the comparability between different studies as well as the species-level resolution of taxonomic classification. We conducted a comprehensive evaluation of the methods that influence the accuracy of taxonomic classification and established an optimal pipeline to achieve high species-level resolution for vaginal microbiota with short amplicons, which will facilitate future studies.


Subjects
Microbiota , Sexually Transmitted Diseases , Female , Humans , RNA, Ribosomal, 16S/genetics , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Microbiota/genetics , Vagina/microbiology , Bacteria
6.
MethodsX ; 11: 102444, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37920873

ABSTRACT

In recent years, the application of next-generation sequencing (NGS) technologies to search for specific genetic markers has become a crucial method for the characterization of microbial communities. Illumina MiSeq, likely the most widespread NGS platform for metabarcoding experiments and taxonomic classification, processes shorter reads than the classical Sanger sequencing method and therefore requires specific primer pairs that produce shorter amplicons. Specifically, for the analysis of the commonly studied Prochlorococcus and Synechococcus communities, the petB marker gene has recently stood out as able to provide deep coverage to determine the microdiversity of the community. However, the current petB primer set produces a 597 bp amplicon that is not suitable for MiSeq chemistry. Here, we designed and tested a petB primer pair that targets both Prochlorococcus and Synechococcus communities, producing an amplicon appropriate for use with state-of-the-art Illumina MiSeq. This new primer set allows the classification of both groups to a low taxonomic level and is therefore suitable for high-throughput experiments using MiSeq technologies, constituting a useful, novel tool to facilitate further studies on Prochlorococcus and Synechococcus communities.
• This work describes the de novo design of a Prochlorococcus and Synechococcus-specific petB primer pair, allowing the characterization of both populations to a low taxonomic level.
• This primer pair is suitable for widespread Illumina MiSeq sequencing technologies.
• petB was confirmed as an adequate target for the characterization of both picocyanobacteria.

7.
Front Microbiol ; 14: 1273462, 2023.
Article in English | MEDLINE | ID: mdl-37795299

ABSTRACT

[This corrects the article DOI: 10.3389/fmicb.2023.1199843.].

8.
Foods ; 12(19), 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37835358

ABSTRACT

Vinegar is one of the most appreciated fermented foods in European and Asian countries. In industry, its elaboration depends on numerous factors, including the nature of starter culture and raw material, as well as the production system and operational conditions. Furthermore, vinegar is obtained by the action of acetic acid bacteria (AAB) on an alcoholic medium in which ethanol is transformed into acetic acid. Besides the highlighted oxidative metabolism of AAB, their versatility and metabolic adaptability make them a taxonomic group with several biotechnological uses. Due to new and rapid advances in this field, this review attempts to approach the current state of knowledge by firstly discussing fundamental aspects related to industrial vinegar production and then exploring aspects related to AAB: classification, metabolism, and applications. Emphasis has been placed on an exhaustive taxonomic review considering the progressive increase in the number of new AAB species and genera, especially those with recognized biotechnological potential.

9.
Emerg Infect Dis ; 29(9): 1941-1944, 2023 09.
Article in English | MEDLINE | ID: mdl-37610155

ABSTRACT

We report a sequencing protocol and 121-kb poxvirus sequence from a clinical sample from a horse in Finland with dermatitis. Based on phylogenetic analyses, the virus is a novel parapoxvirus associated with a recent epidemic; previous data suggest zoonotic potential. Increased awareness of this virus and specific diagnostic protocols are needed.


Subjects
Communicable Diseases , Parapoxvirus , Poxviridae , Horses , Animals , Parapoxvirus/genetics , Finland/epidemiology , Phylogeny
10.
Front Microbiol ; 14: 1199843, 2023.
Article in English | MEDLINE | ID: mdl-37593543

ABSTRACT

Introduction: Temperate phages can engage in the horizontal transfer of functional genes to their bacterial hosts. Thus, their genetic material becomes an intimate part of bacterial genomes and plays essential roles in bacterial mutation and evolution. Specifically, temperate phages can naturally transmit genes by integrating their genomes into the bacterial host genomes via integrases. Our previous study showed that Salmonella enterica contains the largest number of temperate phages among all publicly available bacterial species. S. enterica is an important pathogen that can cause serious systemic infections and even fatalities. Methods: Initially, we extracted all S. enterica temperate phages from the extensively developed temperate phage database established in our previous study. Subsequently, we conducted an in-depth analysis of the genetic characteristics and integration specificity exhibited by these S. enterica temperate phages. Results: Here we identified 8,777 S. enterica temperate phages, all of which have integrases in their genomes. We found 491 non-redundant S. enterica temperate phage integrases (integrase entries). S. enterica temperate phage integrases were classified into three types: intA, intS, and phiRv2. Correlation analysis showed that the sequence lengths of S. enterica integrase and core regions of attB and attP were strongly correlated. Further phylogenetic analysis and taxonomic classification indicated that both the S. enterica temperate phage genomes and the integrase gene sequences were highly diverse. Discussion: Our work provides insight into the essential integration specificity and genetic diversity of S. enterica temperate phages. This study paves the way for a better understanding of the interactions between phages and S. enterica. By analyzing a large number of S. enterica temperate phages and their integrases, we provide valuable insights into the genetic diversity and prevalence of these elements.
This knowledge has important implications for developing targeted therapeutic interventions, such as phage therapy, to combat S. enterica infections. By harnessing the lytic capabilities of temperate phages, they can be engineered or utilized in phage cocktails to specifically target and eradicate S. enterica strains, offering an alternative or complementary approach to traditional antibiotic treatments. Our study has implications for public health and holds potential significance in combating clinical infections caused by S. enterica.

11.
BMC Ecol Evol ; 23(1): 27, 2023 06 28.
Article in English | MEDLINE | ID: mdl-37370016

ABSTRACT

BACKGROUND: Ictalurus is one of the most representative groups of North American freshwater fishes. Although this group has a well-studied fossil record and has been the subject of several morphological and molecular phylogenetic studies, incomplete taxonomic sampling and insufficient taxonomic studies have produced a rather complex classification, along with intricate patterns of evolutionary history in the genus that are considered unresolved and remain under debate. RESULTS: Based on four loci and the most comprehensive taxonomic sampling analyzed to date, including currently recognized species, previously synonymized species, undescribed taxa, and poorly studied populations, this study produced a resolved phylogenetic framework that provided plausible species delimitation and an evolutionary time framework for the genus Ictalurus. CONCLUSIONS: Our phylogenetic hypothesis revealed that Ictalurus comprises at least 13 evolutionary units, partially corroborating the current classification and identifying populations that emerge as putative undescribed taxa. The divergence times of the species indicate that the diversification of Ictalurus dates to the early Oligocene, confirming its status as one of the oldest genera within the family Ictaluridae.


Subjects
Catfishes , Ictaluridae , Animals , Phylogeny , Ictaluridae/genetics , Catfishes/genetics , Biological Evolution
12.
Life (Basel) ; 13(5), 2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37240753

ABSTRACT

Microbial degradation of aromatic hydrocarbons is an emerging technology, well recognized for its low cost, efficiency, and safety; however, it remains little explored, and greater emphasis on cyanobacteria-bacterial mutualistic interactions is needed. We evaluated and characterized the phenanthrene biodegradation capacity of a consortium dominated by Fischerella sp. under holoxenic conditions with aerobic heterotrophic bacteria, and identified its members by 16S rRNA Illumina sequencing. Results indicated that our microbial consortium can degrade up to 92% of phenanthrene in five days. Bioinformatic analyses revealed that the consortium was dominated by Fischerella sp.; however, different members of Nostocaceae and Weeksellaceae, as well as several other bacteria, such as Chryseobacterium and Porphyrobacter, were found to be putatively involved in the biological degradation of phenanthrene. This work contributes to a better understanding of the biodegradation of phenanthrene by cyanobacteria and identifies the associated microbial diversity.

14.
Microb Genom ; 9(3), 2023 03.
Article in English | MEDLINE | ID: mdl-36867161

ABSTRACT

In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different tools are 'best', there are two tools that have been used the most to date: Kraken (k-mer-based classification against a user-constructed database) and MetaPhlAn (classification by alignment to clade-specific marker genes), the latest versions of which are Kraken2 and MetaPhlAn 3, respectively. We found large discrepancies in both the proportion of reads that were classified and the number of species that were identified when we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets. We then investigated which of these tools would give classifications closest to the real composition of metagenomic samples using a range of simulated and mock samples, and examined the combined impact of tool-parameter-database choice on the taxonomic classifications given. This revealed that there may not be a one-size-fits-all 'best' choice. While Kraken2 can achieve better overall performance, with higher precision, recall and F1 scores, as well as alpha- and beta-diversity measures closer to the known composition than MetaPhlAn 3, the computational resources required for this may be prohibitive for many researchers, and the default database and parameters should not be used. We therefore conclude that the best tool-parameter-database choice for a particular application depends on the scientific question of interest, which performance metric is most important for this question, and the limit of available computational resources.


Subjects
Metagenome , Microbiota , Humans , Databases, Factual , Metagenomics
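
The Kraken2 versus MetaPhlAn 3 comparison above is reported in terms of precision, recall and F1 against samples of known composition. A minimal sketch of how those three metrics can be computed at the species level is given below, assuming the classifier output and the mock-community truth are both available as sets of species names. The function name and the example species lists are invented for illustration and do not come from the study.

```python
def precision_recall_f1(predicted: set[str], truth: set[str]) -> tuple[float, float, float]:
    """Species-level precision, recall and F1 from presence/absence calls."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mock community and hypothetical classifier output.
truth = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Akkermansia muciniphila",
    "Staphylococcus aureus",
}
predicted = {"Escherichia coli", "Bacillus subtilis", "Shigella flexneri"}

precision, recall, f1 = precision_recall_f1(predicted, truth)
```

Here two of the three predicted species are correct and two of the four true species are recovered, so precision is 2/3, recall is 1/2 and F1 is 4/7.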
15.
Environ Microbiome ; 18(1): 16, 2023 Mar 08.
Article in English | MEDLINE | ID: mdl-36890583

ABSTRACT

We present here POSMM (pronounced 'Possum'), the Python-Optimized Standard Markov Model classifier, a new incarnation of the Markov model approach to metagenomic sequence analysis. Built on top of SMM, a rapid Markov-model-based classification algorithm, POSMM reintroduces the high sensitivity associated with alignment-free taxonomic classifiers to probe whole genome or metagenome datasets of increasingly prohibitive sizes. Logistic regression models generated and optimized using the Python sklearn library transform Markov model probabilities into scores suitable for thresholding. Featuring a dynamic database-free approach, models are generated directly from genome fasta files per run, making POSMM a valuable accompaniment to many other programs. By combining POSMM with ultrafast classifiers such as Kraken2, their complementary strengths can be leveraged to produce higher overall accuracy in metagenomic sequence classification than either achieves as a standalone classifier. POSMM is a user-friendly and highly adaptable tool designed for broad use by the metagenome scientific community.
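
The POSMM abstract above describes scoring reads with Markov models built from genomes and then mapping those scores through a logistic model. The sketch below illustrates the general technique only: a fixed-order Markov model with pseudocounts, a per-base average log-likelihood, and a logistic transform. All function names, the model order, the pseudocount, and the `weight` and `bias` values are assumptions for this example; in practice the logistic parameters would be fitted, and none of this is POSMM's actual code.

```python
import math
from collections import defaultdict


def train_markov(genome: str, order: int = 2, pseudocount: float = 1.0):
    """Estimate P(next base | previous `order` bases) from one genome."""
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(genome)):
        counts[genome[i - order:i]][genome[i]] += 1

    def log_prob(context: str, base: str) -> float:
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount  # 4 possible bases
        return math.log((seen.get(base, 0.0) + pseudocount) / total)

    return log_prob


def score_read(read: str, log_prob, order: int = 2) -> float:
    """Average log-likelihood per transition, so read length cancels out."""
    steps = range(order, len(read))
    if not steps:
        return 0.0
    return sum(log_prob(read[i - order:i], read[i]) for i in steps) / len(steps)


def to_probability(score: float, weight: float = 5.0, bias: float = 5.0) -> float:
    """Logistic transform; weight and bias are placeholder values."""
    return 1.0 / (1.0 + math.exp(-(weight * score + bias)))


model = train_markov("ACGT" * 50)
matching = score_read("ACGTACGT", model)
unrelated = score_read("TTTTTTTT", model)
```

A read drawn from the training genome receives a higher average log-likelihood than an unrelated read, and the logistic step turns either score into a value between 0 and 1 that can be thresholded.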

16.
Front Microbiol ; 14: 1240957, 2023.
Article in English | MEDLINE | ID: mdl-38235435

ABSTRACT

Introduction: A common task in the analysis of microbial communities involves assigning taxonomic labels to the sequences derived from organisms found in the communities. Frequently, such labels are assigned using machine learning algorithms that are trained to recognize individual taxonomic groups based on training data sets that comprise sequences with known taxonomic labels. Ideally, the training data should rely on labels that are experimentally verified: formal taxonomic labels require knowledge of physical and biochemical properties of organisms that cannot be directly inferred from sequence alone. However, the labels associated with sequences in biological databases are most commonly computational predictions, which themselves may rely on computationally generated data, a process commonly referred to as "transitive annotation." Methods: In this manuscript we explore the implications of training a machine learning classifier (the Ribosomal Database Project's Bayesian classifier in our case) on data that itself has been computationally generated. We generate new training examples based on 16S rRNA data from a metagenomic experiment, and evaluate the extent to which the taxonomic labels predicted by the classifier change after re-training. Results: We demonstrate that even a few computationally generated training data points can significantly skew the output of the classifier, to the point where entire regions of the taxonomic space can be disturbed. Discussion and conclusions: We conclude with a discussion of key factors that affect the resilience of classifiers to transitively annotated training data, and propose best practices to avoid the artifacts described in our paper.

17.
BMC Genomics ; 23(1): 835, 2022 Dec 16.
Article in English | MEDLINE | ID: mdl-36526963

ABSTRACT

BACKGROUND: Despite the applications of Bacillus subtilis group species in various sectors, limited information is available regarding their phages. Here, 61 B. subtilis group species-infecting phages (BSPs) were studied for their taxonomic classification considering genome size, genomic diversity, and host, followed by the identification of orthologous taxonomic signature genes. RESULTS: BSPs have widely ranging genome sizes that can be bunched into groups to demonstrate correlations to family and subfamily classifications. Comparative analysis re-confirmed the 14 existing BSP-containing genera and 21 species and displayed inter-genera similarities within existing subfamilies. Importantly, it also revealed the need for the creation of new taxonomic classifications, including 28 species, nine genera, and two subfamilies (New subfamily1 and New subfamily2), to accommodate inter-genera relatedness. Following pangenome analysis, no ortholog shared by all BSPs was identified, while orthologs shared by more than two-thirds of the BSPs, namely the tail fibers/spike proteins and poly-gamma-glutamate hydrolase, were identified. More importantly, major capsid protein (MCP) type I, MCP type II, MCP type III and peptidoglycan binding proteins, which are distinctive orthologs for Herelleviridae, Salasmaviridae, New subfamily1, and New subfamily2, respectively, were identified and analyzed; these could serve as signatures to distinguish BSP members of the respective taxon. CONCLUSIONS: In this study, we show the genomic diversity and propose a comprehensive classification of 61 BSPs, including the proposition for the creation of two new subfamilies, followed by the identification of orthologous taxonomic signature genes, potentially contributing to phage taxonomy.


Subjects
Bacillus , Bacteriophages , Bacteriophages/genetics , Bacillus/genetics , Bacillus subtilis/genetics , Genomics , Genome, Viral , Phylogeny
18.
Rev. biol. trop ; 70(1), Dec. 2022.
Article in Spanish | LILACS, SaludCR | ID: biblio-1387722

ABSTRACT

Introduction: The diversity of a biological community is the result of ecological and historical processes, which, when analyzed jointly, produce a better understanding of the causes that generate it. Objective: We update and analyze the specific and taxonomic diversity of the ichthyofauna of the Amacuzac River, Mexico. Methods: During five sampling seasons (2019-2020) we collected fishes from ten sites in the river and applied a cluster analysis to habitat variables. Results: We collected 7 638 individuals belonging to seven native and nine non-native species, including Copadichromis borleyi, a new record for the Amacuzac. Richness per site ranged from eight to 13 species. Habitat variables defined four groups. The most abundant species were Poeciliopsis gracilis, Poecilia maylandi and Amatitlania nigrofasciata. The least abundant species were Pterygoplichthys pardalis, Ilyodon whitei, Copadichromis borleyi and Ictalurus punctatus. The most prevalent species were A. nigrofasciata, Amphilophus istlanus, Andinoacara rivulatus, Notropis boucardi, Oreochromis sp., P. maylandi, P. gracilis and Thorichthys maculipinnis. The most restricted species were Atherinella balsana, C. borleyi and I. punctatus. Conclusions: Endangered species such as A. istlanus and N. boucardi are still prevalent in the river, but non-native species continue to increase. Analyzing the diversity from two perspectives provides a more complete view of the changes taking place in the Amacuzac River as a consequence of species establishment, information that is important for future conservation strategies.


Subjects
Animals , Aquatic Fauna , Rivers , Biodiversity , Mexico
19.
PeerJ ; 10: e14292, 2022.
Article in English | MEDLINE | ID: mdl-36389404

ABSTRACT

As reference sequence databases and high-throughput sequencing datasets continue to grow in size, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxonomy and summarize the composition of microbial communities, but they sacrifice accuracy and can lead to false positives. Full alignment tools provide higher confidence assignments and can assign sequences from genomes that diverge from reference sequences; however, full alignment tools are computationally intensive. To address this, we designed MTSv specifically for alignment-based taxonomic assignment in metagenomic analysis. This tool implements an FM-index assisted q-gram filter and a SIMD accelerated Smith-Waterman algorithm to find alignments. However, unlike traditional aligners, MTSv will not attempt to make additional alignments to a TaxID once an alignment of sufficient quality has been found. This improves efficiency when many reference sequences are available per taxon. MTSv was designed to be flexible and can be modified to run on either memory or processor constrained systems. Although MTSv cannot compete with the speeds of exact k-mer matching approaches, it is reasonably fast and has higher precision than popular exact matching approaches. Because MTSv performs a full alignment, it can classify reads even when the genomes share low similarity with reference sequences, and it provides a tool for high confidence pathogen detection with low off-target assignments to near neighbor species.


Subjects
Algorithms , Metagenome , Sequence Analysis, DNA , Metagenome/genetics , Databases, Nucleic Acid , Metagenomics
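
The MTSv abstract above describes skipping further alignments to a TaxID once one alignment of sufficient quality has been found. The sketch below illustrates that early-exit idea with a plain, unaccelerated Smith-Waterman score. It omits the FM-index q-gram filter and SIMD acceleration entirely; the function names, scoring values, threshold, and toy sequences are assumptions made for this example rather than MTSv's implementation.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between `a` and `b` (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def assign_taxa(read: str, references: dict[str, list[str]], min_score: int):
    """Return {taxid: score} plus the number of alignments actually run."""
    hits = {}
    alignments_run = 0
    for taxid, sequences in references.items():
        for sequence in sequences:
            alignments_run += 1
            score = smith_waterman_score(read, sequence)
            if score >= min_score:
                hits[taxid] = score
                break  # one good alignment is enough for this taxon
    return hits, alignments_run


# Hypothetical references: three sequences for taxA, two for taxB.
references = {
    "taxA": ["TTACGTACGTTT", "ACGTACGT", "ACGTACGA"],
    "taxB": ["GGGGGGGG", "CCCCCCCC"],
}
hits, alignments_run = assign_taxa("ACGTACGT", references, min_score=16)
```

In this toy case the first taxA reference already reaches the threshold, so the remaining two taxA sequences are never aligned, and only three of the five possible alignments are computed.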
20.
Microb Genom ; 8(10), 2022 10.
Article in English | MEDLINE | ID: mdl-36269282

ABSTRACT

Culture-independent metagenomic detection of microbial species has the potential to provide rapid and precise real-time diagnostic results. However, it is potentially limited by sequencing and taxonomic classification errors. We use simulated and real-world data to benchmark rates of species misclassification using 100 reference genomes for each of the ten common bloodstream pathogens and six frequent blood-culture contaminants (n=1568; only 68 genomes were available for Micrococcus luteus). Simulating both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken and Centrifuge, utilizing mini (8 GB) and standard (30-50 GB) databases. Bracken with the standard database performed best; the median percentage of reads across both sequencing platforms identified correctly to the species level was 97.8% (IQR 92.7:99.0) [range 5:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 86.4% (IQR 50.5:93.7) [range 4.3:100]. Classification performance varied by species, with Escherichia coli being more challenging to classify correctly (probability of reads being assigned to the correct species: 56.1-96.0%, varying by tool used). Human read misclassification was negligible. By filtering out shorter Nanopore reads we found performance similar or superior to Illumina sequencing, despite higher sequencing error rates. Misclassification was more common when the misclassified species had a higher average nucleotide identity to the true species. Our findings highlight that taxonomic misclassification of sequencing data occurs and varies by sequencing and analysis workflow. To account for 'bioinformatic contamination' we present a contamination catalogue that can be used in metagenomic pipelines to ensure accurate results that can support clinical decision making.


Subjects
Nanopores , Humans , Benchmarking/methods , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods , Nucleotides